Model-based parser generators decouple language specification from language processing. The
model-driven approach avoids the limitations that conventional parser generators impose on the
language designer. Conventional tools require the designed language grammar to conform to the
specific kind of grammar supported by the particular parser generator (LL and LR parser
generators being the most common). Model-driven parser generators, like ModelCC, do not require
a grammar specification, since that grammar can be automatically derived from the language
model and, if needed, adapted to conform to the requirements of the given kind of parser, all of
this without interfering with the conceptual design of the language and its associated applications.
Moreover, model-driven tools such as ModelCC are able to automatically resolve references
between language elements, hence producing abstract syntax graphs instead of abstract syntax trees
as the result of the parsing process. Such graphs are not confined to directed acyclic graphs and
they can contain cycles, since ModelCC supports anaphoric, cataphoric, and recursive references.



I. INTRODUCTION 

A formal language represents a set of strings. Formal
languages consist of an alphabet, which describes
the basic symbol or character set of the language, and a
grammar, which describes how to write valid sentences
of the language [1]. In Computer Science, formal
languages are used, among other things, for the precise
definition of data formats and the syntax of programming
languages.

Most existing language specification techniques [1]
require the language designer to provide a textual
specification of the language grammar. The proper
specification of such a grammar is a nontrivial process that
depends on the lexical and syntax analysis techniques
to be used, since each kind of technique requires the
grammar to comply with a specific set of constraints.
Each analysis technique is characterized by its expressive
power, and this expressive power determines whether
a given analysis technique is suitable for a particular
language. The most significant constraints on formal
language specification originate from the need to consider
context-sensitivity, the need to perform an efficient
analysis, and some techniques' inability to resolve conflicts
caused by grammar ambiguities.

As an alternative approach, model-based language
specification techniques [7] decouple language design
from language processing and automatically generate the
corresponding language grammar, thus making the
language design process less arduous.

While, in general, the result of the parsing process is an 
abstract syntax tree that corresponds to a valid parsing of 
the input text according to the language concrete syntax, 
nothing prevents the model-based language designer from 
modeling non-tree structures. 

Typically, syntax analysis defers some analysis tasks
to later stages in the language processing pipeline, such
as reference resolution and other semantic checks.
However, a model-driven parser generator can be employed
to automate some parts of this process.

ModelCC is a model-based parser generator that
includes support for dealing with references between
language elements, thus incorporating the reference
resolution that is traditionally hand-crafted with the help of a
symbol table into the parsing process.

In this paper, we explain how ModelCC is able to
resolve references and obtain abstract syntax graphs as
the result of the parsing process, rather than the
traditional abstract syntax trees obtained from conventional
parser generators.

Section II introduces model-based language
specification. Section III explains the reference resolution support
in the ModelCC model-based parser generator. Section
IV includes a case study that illustrates abstract syntax
graph parsing. Finally, Section V presents our
conclusions and future work.



II. BACKGROUND 

In its most general sense, a model is anything used 
in any way to represent something else. In this sense,
a grammar is a model of the language it defines. In 
Software Engineering, data models are also common. 
Data models explicitly determine the structure of data. 
Roughly speaking, they describe the elements they repre- 
sent and the relationships existing among them. From a 
formal point of view, it should be noted that data mod- 
els and grammar-based language specifications are not 
equivalent, even though both of them can be used to 
represent data structures. A data model can express re- 
lationships a grammar-based language specification can- 
not. A data model does not need to comply with the con- 
straints a grammar-based language specification has to 
comply with. In general, describing a data model is
easier than describing the corresponding grammar-based
language specification.

In practice, when we want to build a complex data
structure from the contents of a file, implementing
the language processor needed to parse the
file requires the software engineer both to build a
grammar-based language specification for the data as
represented in the file and to implement the conversion from
the parse tree returned by the parser to the desired data
structure, which is an instance of the data model that
describes the data in the file.

Whenever the language specification has to be mod- 
ified, the language designer has to manually propagate 
changes throughout the entire language processor tool 
chain, from the specification of the grammar defining the 
formal language (and its adaptation to specific parsing 
tools) to the corresponding data model. These updates 
are time-consuming, tedious, and error-prone. As these 
changes are labor-intensive, the traditional language
processing approach hampers the maintainability and
evolution of the language used to represent the data.

Moreover, it is not uncommon for different applications
to use the same language. For example, the compiler,
different code generators, and other tools such as IDE
editors or debuggers typically need to grapple with the full
syntax of a programming language. Unfortunately,
maintaining them typically requires keeping several copies of
the same language specification in sync.

The idea behind model-based language specification is 
that, starting from a single abstract syntax model (ASM) 
that represents the core concepts in a language, language 
designers can develop one or several concrete syntax mod- 
els (CSMs). These CSMs can suit the specific needs of the 
desired textual or graphical representation. The ASM- 
CSM mapping can be performed, for instance, by an- 
notating the abstract syntax model with the constraints 
needed to transform the elements in the abstract syntax 
into their concrete representation. 

This way, the ASM representing the language can be 
modified as needed without having to worry about the 
language processor and the peculiarities of the chosen 
parsing technique, since the corresponding language pro- 
cessor will be automatically updated. 

Finally, as the ASM is not bound to a particular 
parsing technique, evaluating alternative and/or comple- 
mentary parsing techniques is possible without having 
to propagate their constraints into the language model. 
Therefore, by using an annotated ASM, model-based lan- 
guage specification completely decouples language spec- 
ification from language processing, which can be per- 
formed using whichever parsing techniques are suitable 
for the formal language implicitly defined by the abstract 
model and its concrete mapping. 

A diagram summarizing the traditional language
design process is shown in Figure 1, whereas the
corresponding diagram for the model-based approach is shown
in Figure 2.

Figure 1 Traditional language processing.
Figure 2 Model-based language processing.



It should be noted that ASMs may represent non-tree
structures, hence the use of the term 'abstract syntax
graph' in Figure 2.

ModelCC [8] is a parser generator that supports a
model-based approach to the design of language process- 
ing systems. Its starting ASM is created by defining 
classes that represent language elements and establish- 
ing relationships among those elements. Once the ASM 
is established, constraints can be imposed over language 
elements and their relationships as annotations in order 
to produce the desired ASM-CSM mapping. 

The ASM is built on top of basic language elements, 
which can be viewed as the tokens in the model-driven 
specification of a language. ModelCC provides the nec- 
essary mechanisms to combine those basic elements into 
more complex language constructs, which correspond to 
the use of concatenation, selection, and repetition in the 
syntax-driven specification of languages. 
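The correspondence between model structure and grammar constructs can be illustrated with a small Java sketch. It assumes a hypothetical in-memory model description (Token, Concat, Choice, and Repeat are names invented here, not ModelCC classes; ModelCC works from annotated Java classes instead) and emits one EBNF-like production per element: a composite element becomes a concatenation, a set of alternatives becomes a selection, and a repeated element becomes a repetition.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical, minimal model description: these types are illustrative only
// and are not ModelCC classes.
final class ModelToGrammar {
    interface Element { String name(); }
    record Token(String name, String regex) implements Element {}            // basic element
    record Concat(String name, List<Element> parts) implements Element {}    // concatenation
    record Choice(String name, List<Element> options) implements Element {}  // selection
    record Repeat(String name, Element item) implements Element {}           // repetition

    // Emits one EBNF-like production per distinct element, depth-first.
    static String grammar(Element root) {
        Map<String, String> rules = new LinkedHashMap<>();
        collect(root, rules);
        return rules.entrySet().stream()
                .map(r -> r.getKey() + " ::= " + r.getValue())
                .collect(Collectors.joining("\n"));
    }

    private static void collect(Element e, Map<String, String> rules) {
        if (rules.containsKey(e.name())) return; // production already emitted
        if (e instanceof Token t) {
            rules.put(t.name(), "/" + t.regex() + "/");
        } else if (e instanceof Concat c) {
            rules.put(c.name(), names(c.parts(), " "));
            c.parts().forEach(p -> collect(p, rules));
        } else if (e instanceof Choice c) {
            rules.put(c.name(), names(c.options(), " | "));
            c.options().forEach(o -> collect(o, rules));
        } else if (e instanceof Repeat r) {
            rules.put(r.name(), r.item().name() + "*");
            collect(r.item(), rules);
        }
    }

    private static String names(List<Element> elements, String separator) {
        return elements.stream().map(Element::name).collect(Collectors.joining(separator));
    }

    public static void main(String[] args) {
        Token name = new Token("Name", "[a-z]+");
        Token number = new Token("Number", "[0-9]+");
        Choice operand = new Choice("Operand", List.of(number, name));
        Concat call = new Concat("Call", List.of(name, new Repeat("Operands", operand)));
        System.out.println(grammar(call));
    }
}
```

For the model built in main, the sketch prints five productions, beginning with `Call ::= Name Operands`, followed by `Name ::= /[a-z]+/`, `Operands ::= Operand*`, `Operand ::= Number | Name`, and `Number ::= /[0-9]+/`.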

In ModelCC, the constraints imposed over ASMs to 
define a particular ASM-CSM mapping are declared as 
metadata annotations on the model itself. Now
supported by all the major programming platforms,
metadata annotations are often used in reflective
programming and code generation. Table I summarizes the
set of constraints supported by ModelCC for establishing
ASM-CSM mappings between ASMs and their concrete
representation in textual CSMs.
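Since annotations with runtime retention can be read through reflection, a generator can inspect the model classes directly. The following Java sketch is only illustrative: the Prefix, Suffix, and Separator annotation types are declared locally and merely borrow names from Table I (they are not ModelCC's own definitions), and the EBNF-like output format is an assumption. It turns the delimiter annotations on a field into a right-hand side for a grammar production.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

final class DelimiterReader {
    // Hypothetical stand-ins named after the delimiter constraints in Table I.
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
    @interface Prefix { String value(); }
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
    @interface Suffix { String value(); }
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
    @interface Separator { String value(); }

    // Example language element: an argument list written as "(a, b, c)".
    static class ArgumentList {
        @Prefix("(") @Suffix(")") @Separator(",")
        List<String> arguments;
    }

    // Renders an EBNF-like right-hand side from the annotated fields of a class.
    static String rightHandSide(Class<?> type) {
        List<String> parts = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            if (f.isSynthetic()) continue; // skip compiler- or tool-generated fields
            Prefix prefix = f.getAnnotation(Prefix.class);
            Suffix suffix = f.getAnnotation(Suffix.class);
            Separator separator = f.getAnnotation(Separator.class);
            StringBuilder sb = new StringBuilder();
            if (prefix != null) sb.append("'").append(prefix.value()).append("' ");
            sb.append(f.getName());
            if (separator != null) {
                // A separated list: first item, then (separator item)*.
                sb.append(" ('").append(separator.value()).append("' ")
                  .append(f.getName()).append(")*");
            }
            if (suffix != null) sb.append(" '").append(suffix.value()).append("'");
            parts.add(sb.toString());
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        System.out.println("ArgumentList ::= " + rightHandSide(ArgumentList.class));
    }
}
```

For ArgumentList, rightHandSide returns `'(' arguments (',' arguments)* ')'`. Keeping the delimiters on the model rather than in a separate grammar file is what lets the grammar be regenerated whenever the model changes.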

When the ASM represents a tree-like structure, a
model-based parser generator is equivalent to a
traditional grammar-based parser generator in terms of
expressive power. When the ASM represents non-tree
structures, reference resolution techniques can be
employed to make model-based parser generators more
powerful than grammar-based ones, as we will see in the next
Section.

Constraints on      Annotation       Function
Patterns            @Pattern         Pattern matching definition of basic language elements.
                    @Value           Field where the recognized input element will be stored.
Delimiters          @Prefix          Element prefix(es).
                    @Suffix          Element suffix(es).
                    @Separator       Element separator(s).
Cardinality         @Optional        Optional elements.
                    @Minimum         Minimum element multiplicity.
                    @Maximum         Maximum element multiplicity.
Evaluation order    @Associativity   Element associativity (e.g. left-to-right).
                    @Composition     Eager or lazy composition for nested composites.
                    @Priority        Element precedence.

Table I Summary of the basic metadata annotations supported by ModelCC.



III. REFERENCE RESOLUTION SUPPORT IN MODELCC

Reference resolution consists of finding the object a ref- 
erence refers to and, in the case of ModelCC, automat- 
ically linking the reference to the corresponding object 
instantiation. This resolution process is what leads to 
abstract syntax graphs instead of trees in model-driven 
language processing. 

In ModelCC, an object reference is embodied by a sub- 
set of the elements in its full object definition. This sub- 
set of elements acts as an identifier (or key in database 
terms) that, when found in the input text, can be rec- 
ognized as a reference to the corresponding object in the 
model and linked to its instantiation in the ASM. 

References in ModelCC can be anaphoric, when they 
are preceded by the corresponding object definition, but 
also cataphoric, when the references precede the defini- 
tion, and even recursive, when they appear within the 
definition they refer to. 

Subsection III.A introduces the @ID metadata
annotation, which allows the specification of identifiers for
language elements. Subsection III.B presents the
@Reference annotation, which allows the specification of
references to other language elements.
Figure 3 ModelCC specification of Messages, their senders,
and their receivers.



A. The @ID Annotation

ModelCC uses an @ID metadata annotation to
support reference specification. This annotation is applied
to a subset of the members of a language element model.
This subset determines the syntax of references to
particular instances of such elements in the concrete syntax
of the corresponding language. That is, any appearance
of the same set of values will be interpreted as a reference
to the same instance of the referred language element.

In our implementation of ModelCC, references are
handled by introducing grammar productions that
characterize such references, together with semantic actions
that map them to the corresponding language elements.

In Figure 3, the @ID annotation is employed to
identify users by a single number.

It should be noted that the @ID annotation is
incompatible with the @Optional ModelCC annotation, as null
language element identifiers are not allowed, for the same
reasons that attributes in a primary key are not nullable
in a relational database.

However, the @ID annotation can be used together
with other ModelCC annotations, such as @FreeOrder,
which allows the members of a language element to be
shuffled in their textual representation, and @Prefix and
@Suffix, which add syntactic sugar to the incarnation of
the abstract syntax model as a concrete textual language.

The inadvertent definition of two entities of the same
class with the same identifier results in a runtime warning
produced by ModelCC when parsing its input.
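The identifier semantics described in this subsection resemble a primary-key index. The following Java sketch is hypothetical (IdRegistry is not part of ModelCC): it keys elements by the tuple of their @ID member values, rejects null identifier members, and records a warning when two definitions share a key. Keeping the first definition on such a clash is an assumption of this sketch, not documented ModelCC behavior.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not ModelCC code): elements are indexed by the tuple of
// their @ID member values, much like rows indexed by a primary key.
final class IdRegistry<T> {
    private final Map<List<?>, T> byKey = new HashMap<>();
    private final List<String> warnings = new ArrayList<>();

    // Registers an element under its identifier values. List.copyOf throws a
    // NullPointerException for null members, mirroring why @ID cannot be
    // combined with @Optional.
    void define(List<?> idValues, T element) {
        List<?> key = List.copyOf(idValues);
        T previous = byKey.putIfAbsent(key, element);
        if (previous != null) {
            // Assumption: keep the first definition and report the clash.
            warnings.add("Duplicate identifier " + key + "; keeping the first definition");
        }
    }

    // Any later appearance of the same identifier values denotes the same instance.
    T lookup(List<?> idValues) {
        return byKey.get(idValues);
    }

    List<String> warnings() {
        return List.copyOf(warnings);
    }
}
```

For users identified by a single number, as in Figure 3, each key is a one-element list such as `List.of(7)`; an element with several @ID members would use a longer tuple.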



B. The @Reference Annotation

ModelCC resorts to the @Reference metadata
annotation to complete its support for reference resolution. The
@Reference annotation applies to individual members of
any language element, provided that the referenced types
contain at least one @ID-annotated member in their
language model.

Whenever a language element member is annotated
with @Reference, the corresponding grammar
productions are modified so that they refer to the symbol
corresponding to the element reference specification rather
than the symbol that corresponds to its full specification.
These productions are then associated with a semantic
action that resolves the references at the end of the parsing
process, in order to support cataphoric and recursive
references in addition to the anaphoric references that could
be resolved on the fly during the parsing process.
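Deferring resolution to the end of parsing can be sketched as a two-phase procedure: record each reference when it is recognized, and link all recorded references once every definition is known. DeferredResolver below is a hypothetical Java illustration of that idea, not ModelCC's generated semantic actions; it shows why a reference that precedes its definition (cataphoric) or appears inside it (recursive) is handled exactly like an anaphoric one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical two-phase resolver (not ModelCC's semantic actions): references
// are recorded while parsing and linked only after the whole input has been read.
final class DeferredResolver<K, T> {
    private final Map<K, T> definitions = new HashMap<>();
    private final List<Map.Entry<K, Consumer<T>>> pending = new ArrayList<>();

    // Called when a full element definition has been recognized.
    void define(K id, T element) {
        definitions.putIfAbsent(id, element);
    }

    // Called when a reference is recognized; its target may not exist yet.
    void refer(K id, Consumer<T> link) {
        pending.add(Map.entry(id, link));
    }

    // Called once at the end of parsing: every definition is known by now, so
    // anaphoric, cataphoric, and recursive references are resolved alike.
    // Returns the identifiers that matched no definition.
    List<K> resolveAll() {
        List<K> unresolved = new ArrayList<>();
        for (Map.Entry<K, Consumer<T>> ref : pending) {
            T target = definitions.get(ref.getKey());
            if (target == null) unresolved.add(ref.getKey());
            else ref.getValue().accept(target);
        }
        pending.clear();
        return unresolved;
    }
}
```

Returning the unresolved identifiers lets the caller report them; how ModelCC itself reports dangling references is not covered by this sketch.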

In Figure 3, the textual syntax of messages includes
numbers that, as identifiers, refer to particular users.
ModelCC will parse such identifiers, recognize the
references, resolve them, and return the correct object graph.

IV. A WORKING CASE STUDY 

In this section, we present an example language that 
allows the specification and rendering of complex 3D ob- 
jects using the reference resolution capabilities of Mod- 
elCC. 

First, we will outline the features we wish to include 
in our 3D object specification language. Then, we will 
provide the full language specification for ModelCC by 
defining an abstract syntax model, which will be anno- 
tated to specify the desired ASM-CSM mapping. Lastly, 
we will see some examples of input and output pairs for 
our 3D object specification language. 

A. Language Description 

Our 3D object specification language is designed to 
support the following features: 

• A special section, denoted by the "scene" keyword, 
delimits the statements that will be used for ren- 
dering the scene. 

• The definition of custom objects, which are identi- 
fied by an object name. As references can be lazily 
resolved, recursion is allowed. 

• Scoped statements, delimited by "{" and "}", that
allow the specification of lists of statements that 
will run sequentially in a new OpenGL scope (that 
is, issuing a "glPushMatrix" before executing the 
statements and "glPopMatrix" after executing the 
statements). 

• Composite statements, delimited by "[" and "]", 
that allow the specification of lists of statements 
that will run sequentially, but without creating a 
new OpenGL scope. 

• Repeated statements that allow the repetition of 
a statement, a group of statements, or a block of 
statements, a specific number of times. 



• Object statements, which draw either basic
objects (e.g. a cube) or user-defined objects. Draw
statements allow the specification of a numeric
parameter. The "next" keyword is replaced at
runtime by the current parameter decreased by one,
and draw statements will not run when the
parameter is 0.

• State-machine OpenGL-like scale transformation 
statements, which support the specification of a 
combination of x, y, and z values in any order, or 
a single scaling factor that will be applied to the 
three axes. 

• State-machine OpenGL-like rotate transformation 
statements, which support the specification of the 
angle and a combination of x, y, and z axis values 
in any order. 

• State-machine OpenGL-like translate
transformation statements, which support the specification of a
combination of x, y, and z values in any order.

• State-machine color transformation statements, 
which support the specification of a combination 
of red, green, blue, and alpha values in any order, 
and allow either absolute (by default) or relative 
color adjustments. 
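The execution semantics of scoped, composite, repeated, and draw statements can be made concrete with a small interpreter. The Java sketch below is hypothetical: the record types, the single x offset standing in for the OpenGL modelview matrix, and the default parameter of 1 for draws without an explicit parameter are assumptions made for illustration, not part of the implementation described in this paper.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Hypothetical interpreter for a subset of the statements listed above.
// A single x offset stands in for the OpenGL modelview matrix.
final class SceneInterpreter {
    interface Stmt {}
    record Translate(double dx) implements Stmt {}
    record Scoped(List<Stmt> body) implements Stmt {}     // "{ ... }": new scope
    record Composite(List<Stmt> body) implements Stmt {}  // "[ ... ]": same scope
    record Repeat(int times, Stmt body) implements Stmt {}
    // param == null means no explicit parameter; next == true means "next".
    record Draw(String object, Integer param, boolean next) implements Stmt {}

    private final Map<String, Stmt> definitions;
    private final Deque<Double> savedStates = new ArrayDeque<>();
    private double x = 0;
    final List<Double> drawnCubes = new ArrayList<>(); // x offset of each cube drawn

    SceneInterpreter(Map<String, Stmt> definitions) {
        this.definitions = definitions;
    }

    void run(Stmt s, int current) {
        if (s instanceof Translate t) {
            x += t.dx();
        } else if (s instanceof Scoped sc) {
            savedStates.push(x);                       // like glPushMatrix
            for (Stmt b : sc.body()) run(b, current);
            x = savedStates.pop();                     // like glPopMatrix
        } else if (s instanceof Composite c) {
            for (Stmt b : c.body()) run(b, current);   // no state is saved
        } else if (s instanceof Repeat r) {
            for (int i = 0; i < r.times(); i++) run(r.body(), current);
        } else if (s instanceof Draw d) {
            // "next" means current - 1; assumption: no explicit parameter means 1.
            int p = d.next() ? current - 1 : (d.param() == null ? 1 : d.param());
            if (p <= 0) return;                        // draws do not run for 0 (or below)
            if (d.object().equals("cube")) drawnCubes.add(x);
            else run(definitions.get(d.object()), p);  // user-defined object
        }
    }
}
```

For instance, a hypothetical definition equivalent to `define row [ draw cube translate x 1 draw row next ]`, drawn with `draw row 3`, places cubes at offsets 0, 1, and 2: each recursive draw receives the current parameter minus one, and the draw that would receive 0 does nothing, which is what terminates the recursion.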



B. ModelCC Implementation 

In ModelCC, the abstract syntax model is designed 
first and then it is mapped to a concrete syntax model by 
imposing constraints by means of metadata annotations 
on the abstract syntax model. 

The resulting model can be processed by ModelCC to
generate the corresponding parser. The UML class
diagram in Figure 4 presents our annotated 3D object
specification language model.

The reference support extension we propose in this pa- 
per can be observed in the Definition, ObjectName, and 
DefinedObject classes. The name member of the Defi- 
nition class is annotated with @ID, which means that a 
Definition instance can be identified by an ObjectName. 
Then, the ref member of a DefinedObject is annotated
with @Reference, which means that, in textual form, a
DefinedObject can refer to a Definition by its
ObjectName. ModelCC reference resolution allows references
to be resolved during the parsing process and makes the
implementation of a traditional symbol table
unnecessary.
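The effect of @ID and @Reference on the Definition and DefinedObject classes can be approximated with plain Java reflection. In the sketch below, the ID and Reference annotations are local, hypothetical stand-ins, the keyField attribute is invented for the example, and linking happens after the objects are built rather than through generated grammar productions and semantic actions as in ModelCC. Linking a snail definition that draws itself yields a DefinedObject whose ref points back to its enclosing Definition, that is, a cycle in the resulting graph.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ReferenceLinker {
    // Hypothetical local stand-ins for @ID and @Reference (not ModelCC's types).
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD) @interface ID {}
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD) @interface Reference {
        String keyField(); // assumption: field holding the identifier text that was parsed
    }

    record ObjectName(String value) {}

    static class Definition {
        @ID ObjectName name;
        final List<Object> body = new ArrayList<>();
        Definition(String name) { this.name = new ObjectName(name); }
    }

    static class DefinedObject {
        ObjectName key;                              // e.g. "snail" in "draw snail next"
        @Reference(keyField = "key") Definition ref; // filled in by link()
        DefinedObject(String key) { this.key = new ObjectName(key); }
    }

    // Indexes elements by (class, @ID value), then fills every @Reference field
    // whose key matches an element of the field's type. Unmatched ones stay null.
    static void link(List<Object> elements) {
        try {
            Map<List<Object>, Object> index = new HashMap<>();
            for (Object e : elements) {
                for (Field f : e.getClass().getDeclaredFields()) {
                    if (f.isAnnotationPresent(ID.class)) index.put(List.of(e.getClass(), f.get(e)), e);
                }
            }
            for (Object e : elements) {
                for (Field f : e.getClass().getDeclaredFields()) {
                    Reference r = f.getAnnotation(Reference.class);
                    if (r == null) continue;
                    Object key = e.getClass().getDeclaredField(r.keyField()).get(e);
                    Object target = key == null ? null : index.get(List.of(f.getType(), key));
                    if (target != null) f.set(e, target);
                }
            }
        } catch (ReflectiveOperationException ex) {
            throw new IllegalStateException(ex);
        }
    }
}
```

Indexing by class as well as by identifier value keeps identifiers of different element types apart; richer scoping rules would need a more elaborate index.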

Figure 4 ModelCC definition of a 3D object specification language. ModelCC reference resolution support is used to allow the
specification of complex 3D objects in the Definition class.

    define snail [
      draw cube
      {
        scale 0.3
        color blue 1
        repeat 6 times [
          draw cube
          translate y 1
          rotate z 1 angle -5
          color relative alpha -0.06
        ]
      }
      translate x 0.8
      rotate z 1 angle 10
      scale 0.98
      color relative red -0.05 green +0.05 alpha -0.008
      draw snail next
    ]

    scene [
      color red 1
      draw snail 400
    ]

Figure 5 Snail specification in our 3D object specification
language.

Figure 6 Different views of the snail specified by the input
text shown in Figure 5.

    define helix [
      {
        scale x 0.4 z 0.4
        draw cube
      }
      {
        rotate y 1 angle 45
        scale 0.4
        scale y 0.2 x 0.2 z 1.5
        repeat 10 times [
          draw cube
          color relative alpha -0.08
          translate z -1
        ]
      }
      translate y 1
      translate x -4 z -4
      rotate y 1 angle 6
      translate x 4 z 4
      draw helix next
    ]

    scene [
      {
        rotate y 1 angle 90 color red 1
        translate x 4 z 4 draw helix 40
      } {
        rotate y 1 angle 180 color green 1
        translate x 4 z 4 draw helix 40
      } {
        rotate y 1 angle 270 color blue 1
        translate x 4 z 4 draw helix 40
      } {
        color red green blue
        translate x 4 z 4 draw helix 40
      }
    ]

Figure 7 Quadruple helix specification in our 3D object
specification language.

Figure 8 Different views of the quadruple helix specified by
the input text shown in Figure 7.

It should be noted that certain constraints cannot be
expressed in the abstract syntax model. However, these
constraints can be expressed as custom constraints
using the @Constraint annotation. In our example, some
statements corresponding to elements in our model, such
as draw statements and repeat statements, do not
accept real values as parameters. These custom semantic
constraints are implemented in the checkArguments()
methods of the language element classes corresponding
to those statements.
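Custom constraints of the kind mentioned above can be sketched as validation methods on the statement classes. The Java code below is a hypothetical illustration: the class shapes, the assumption that numeric literals are parsed as reals, and the boolean return convention of checkArguments() are assumptions, and the sketch does not show how ModelCC's @Constraint annotation invokes such methods.

```java
// Hypothetical sketch of checkArguments()-style constraints; the class shapes
// and the boolean return convention are assumptions, not the paper's code.
final class ArgumentChecks {
    // True for finite values without a fractional part (rejects 2.5, NaN, Infinity).
    static boolean isWholeNumber(double value) {
        return Double.isFinite(value) && value == Math.rint(value);
    }

    static class DrawStatement {
        final String object;
        final double parameter; // assumption: numeric literals are parsed as reals

        DrawStatement(String object, double parameter) {
            this.object = object;
            this.parameter = parameter;
        }

        // Draw parameters must not be real values such as 2.5.
        boolean checkArguments() { return isWholeNumber(parameter); }
    }

    static class RepeatStatement {
        final double times; // assumption: numeric literals are parsed as reals

        RepeatStatement(double times) { this.times = times; }

        // Repetition counts must not be real values either.
        boolean checkArguments() { return isWholeNumber(times); }
    }
}
```

Under these assumptions, `draw snail 400` and `repeat 6 times` pass the check, whereas a parameter such as 2.5 would be rejected.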

ModelCC is able to automatically generate a grammar
from the ASM defined by a class model and the
ASM-CSM mapping defined as a set of metadata annotations
on the class model. References in that grammar are
automatically resolved by ModelCC, so no further work is
needed.



C. Examples of 3D Object Specification

Figures 5 and 6 illustrate the specification and
rendering of a 3D snail in our 3D object specification language.
The snail object is defined as a single section of the snail,
consisting of the shell and a blue strip, together with a slightly
smaller, more transparent, and more greenish snail. The
scene consists of a 400-section snail object.

Figures 7 and 8 illustrate the specification and
rendering of a quadruple 3D helix in our 3D object specification
language. The helix object is defined as a single section
of a helix, consisting of the outer part and a strip that
grows more transparent until it reaches the axis, together with a
slightly rotated and translated helix. The scene consists
of red, green, blue, and black 40-section helix objects.

V. CONCLUSIONS AND FUTURE WORK

ModelCC is a model-based parser generator that
employs metadata annotations to implement ASM-CSM
mappings.

We have described how ModelCC supports reference
resolution and allows parsing abstract syntax graphs
rather than the conventional abstract syntax trees
obtained by traditional grammar-driven parser generators.

We have demonstrated the use of ModelCC reference
resolution support by designing and implementing
a fully-functional abstract syntax graph parser for a 3D
object specification language.

In the future, we plan to apply model-based language
specification techniques to problems such as data
integration. We also plan to implement metadata annotations
that support more complex scoping rules for reference
resolution.